

Search for: All records

Creators/Authors contains: "Lim, Yongwan"


  1. Abstract

    Real-time magnetic resonance imaging (RT-MRI) of human speech production is enabling significant advances in speech science, linguistics, bio-inspired speech technology development, and clinical applications. Easy access to RT-MRI is, however, limited, and comprehensive datasets with broad access are needed to catalyze research across numerous domains. Imaging the rapidly moving articulators and dynamic airway shaping during speech demands high spatio-temporal resolution and robust reconstruction methods. Further, while reconstructed images have been published, to date there is no open dataset providing raw multi-coil RT-MRI data from an optimized speech production experimental setup. Such datasets could enable new and improved methods for dynamic image reconstruction, artifact correction, feature extraction, and direct extraction of linguistically relevant biomarkers. The present dataset offers a unique corpus of 2D sagittal-view RT-MRI videos along with synchronized audio for 75 participants performing linguistically motivated speech tasks, alongside the corresponding public domain raw RT-MRI data. The dataset also includes 3D volumetric vocal tract MRI during sustained speech sounds and high-resolution static anatomical T2-weighted upper airway MRI for each participant.

     
  2. Level of Evidence: 5

    Technical Efficacy Stage: 1

     
  3. Purpose

    To develop and evaluate a fast and effective method for deblurring spiral real‐time MRI (RT‐MRI) using convolutional neural networks.

    Methods

    We demonstrate a 3‐layer residual convolutional neural network to correct image‐domain off‐resonance artifacts in speech production spiral RT‐MRI without knowledge of field maps. The architecture is motivated by traditional deblurring approaches. Spatially varying off‐resonance blur is synthetically generated using a discrete object approximation and field maps, with data augmentation, from a large database of 2D human speech production RT‐MRI. The effects of off‐resonance range, shift‐invariance of blur, and readout duration on deblurring performance are investigated. The proposed method is validated using synthetic and real data with longer readouts, quantitatively using image quality metrics and qualitatively via visual inspection, and with a comparison to conventional deblurring methods.
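
The discrete object approximation mentioned above can be illustrated with a toy model. The following is a minimal sketch, not the authors' simulation: it assumes a 1D object and a Cartesian-ordered readout instead of a 2D spiral, and the names `simulate_offres_signal`, `naive_recon`, and all numeric values are hypothetical. Each pixel contributes its value with an extra phase that grows with its off-resonance frequency and the sample time; reconstructing while ignoring that phase yields a distorted image.

```python
import cmath
import math


def simulate_offres_signal(obj, fmap_hz, times_s):
    """Discrete-object signal model: sum each pixel's contribution,
    including the phase accrued from its off-resonance frequency."""
    n_pix = len(obj)
    signal = []
    for k, t in enumerate(times_s):
        acc = 0j
        for n in range(n_pix):
            phase = -2.0 * math.pi * (k * n / n_pix + fmap_hz[n] * t)
            acc += obj[n] * cmath.exp(1j * phase)
        signal.append(acc)
    return signal


def naive_recon(signal):
    """Inverse DFT that ignores off-resonance (no field-map knowledge)."""
    n_pix = len(signal)
    image = []
    for n in range(n_pix):
        acc = sum(signal[k] * cmath.exp(2j * math.pi * k * n / n_pix)
                  for k in range(n_pix))
        image.append(acc / n_pix)
    return image


def max_error(image, obj):
    """Largest per-pixel deviation from the true object."""
    return max(abs(a - b) for a, b in zip(image, obj))


# Hypothetical toy values: 8 pixels, 8 samples spread over an 8 ms readout.
obj = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
times_s = [k * 1e-3 for k in range(len(obj))]
on_resonance = [0.0] * len(obj)
off_resonance = [0.0, 0.0, 50.0, 100.0, 150.0, 0.0, 0.0, 0.0]  # Hz

clean = naive_recon(simulate_offres_signal(obj, on_resonance, times_s))
corrupted = naive_recon(simulate_offres_signal(obj, off_resonance, times_s))
```

Pairs such as (`corrupted`, `clean`) are the kind of input and target a deblurring network could be trained on; in this Cartesian toy the artifact appears mainly as a spatially varying shift, whereas spiral readouts produce blur.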

    Results

    Deblurring performance was found to be superior to a current autocalibrated method for in vivo data and only slightly worse than an ideal reconstruction with perfect knowledge of the field map for synthetic test data. Convolutional neural network deblurring made it possible to visualize articulator boundaries with readouts up to 8 ms at 1.5 T, which is 3‐fold longer than the current standard practice. The computation time was 12.3 ± 2.2 ms per frame, enabling low‐latency processing for RT‐MRI applications.

    Conclusion

    Convolutional neural network deblurring is a practical, efficient, and field map‐free approach for deblurring spiral RT‐MRI. In the context of speech production imaging, this can enable a 1.7‐fold improvement in scan efficiency and the use of spiral readouts at higher field strengths such as 3 T.
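
The abstract above describes a 3‐layer residual network. The sketch below shows only that residual structure and should not be read as the published architecture: it assumes a real-valued, single-channel 1D signal, and the function names and kernels are hypothetical placeholders (the actual network operates on 2D images, and its kernel sizes, channel counts, and trained weights are not given here). The three convolutions predict a correction that is added back to the input, so the network only has to learn the difference between blurred and sharp images.

```python
def conv1d_same(x, kernel):
    """Zero-padded 'same' 1D cross-correlation with an odd-length kernel."""
    half = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - half
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


def relu(x):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in x]


def residual_cnn(x, k1, k2, k3):
    """Three convolution layers whose output is added to the input."""
    h = relu(conv1d_same(x, k1))
    h = relu(conv1d_same(h, k2))
    correction = conv1d_same(h, k3)  # last layer is linear
    return [xi + ci for xi, ci in zip(x, correction)]


# Hypothetical, untrained kernels chosen only to exercise the code path.
blurred = [1.0, -2.0, 3.0]
identity = [0.0, 1.0, 0.0]
deblurred = residual_cnn(blurred, identity, identity, [0.0, 0.5, 0.0])
```

One consequence of the residual design is that a network whose last layer outputs zero leaves the image unchanged, which gives training a sensible starting point.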

     
  4. Purpose

    To provide 3D real‐time MRI of speech production with improved spatio‐temporal sharpness using randomized, variable‐density, stack‐of‐spiral sampling combined with a 3D spatio‐temporally constrained reconstruction.

    Methods

    We evaluated five candidate (k,t) sampling strategies using a previously proposed gradient‐echo stack‐of‐spiral sequence and a 3D constrained reconstruction with spatial and temporal penalties. Regularization parameters were chosen by expert readers based on qualitative assessment. We experimentally determined the effect of spiral angle increment and k_z temporal order. The strategy yielding highest image quality was chosen as the proposed method. We evaluated the proposed and original 3D real‐time MRI methods in 2 healthy subjects performing speech production tasks that invoke rapid movements of articulators seen in multiple planes, using interleaved 2D real‐time MRI as the reference. We quantitatively evaluated tongue boundary sharpness in three locations at two speech rates.
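
To make "spatial and temporal penalties" concrete, here is a minimal sketch of finite-difference regularization terms. It rests on assumptions not stated in the abstract: an l1 norm of the differences, 1D frames instead of 3D volumes, and hypothetical names (`spatial_penalty`, `temporal_penalty`, `lam_s`, `lam_t`). The data-fidelity term that a full constrained reconstruction would also minimize is omitted.

```python
def temporal_penalty(frames):
    """Sum of absolute differences between consecutive time frames."""
    total = 0.0
    for prev, nxt in zip(frames, frames[1:]):
        total += sum(abs(b - a) for a, b in zip(prev, nxt))
    return total


def spatial_penalty(frames):
    """Sum of absolute differences between neighboring pixels in each frame."""
    total = 0.0
    for frame in frames:
        total += sum(abs(b - a) for a, b in zip(frame, frame[1:]))
    return total


def regularization(frames, lam_s, lam_t):
    """Weighted sum of the spatial and temporal penalties."""
    return lam_s * spatial_penalty(frames) + lam_t * temporal_penalty(frames)


# Hypothetical toy series: a static edge, then the same edge moving one pixel.
static_series = [[0.0, 1.0, 1.0], [0.0, 1.0, 1.0]]
moving_series = [[0.0, 1.0, 1.0], [0.0, 0.0, 1.0]]
```

The weights `lam_s` and `lam_t` play the role of the regularization parameters that, per the abstract, were chosen by expert readers: a larger temporal weight suppresses frame-to-frame change, which reduces aliasing but can blur fast articulator motion.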

    Results

    The proposed data‐sampling scheme uses a golden‐angle spiral increment in the k_x‐k_y plane and variable‐density, randomized encoding along k_z. It provided a statistically significant improvement in tongue boundary sharpness score (P < .001) in the blade, body, and root of the tongue during normal and 1.5‐times speeded speech. Qualitative improvements were substantial during natural speech tasks of alternating high, low tongue postures during vowels. The proposed method was also able to capture complex tongue shapes during fast alveolar consonant segments. Furthermore, the proposed scheme allows flexible retrospective selection of temporal resolution.
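
The sampling scheme described above can be sketched as an acquisition-order generator. This is an illustration under stated assumptions, not the published implementation: the golden angle is taken as about 137.51°, the k_z density is a hypothetical Gaussian weighting toward the center of k-space, and `sampling_order`, `bin_frames`, `sigma`, and `seed` are invented names.

```python
import math
import random

# Golden angle, assuming the 360 * (1 - 1/phi) definition (about 137.51 degrees).
GOLDEN_ANGLE_DEG = 180.0 * (3.0 - math.sqrt(5.0))


def sampling_order(n_arms, n_kz, sigma=0.35, seed=0):
    """Return (spiral rotation angle, kz index) for each acquired spiral arm.

    Angles advance by the golden angle; kz indices are drawn at random with a
    hypothetical Gaussian density that favors the center of k-space.
    """
    rng = random.Random(seed)
    kz_indices = list(range(-(n_kz // 2), n_kz - n_kz // 2))
    weights = [math.exp(-((kz / (sigma * n_kz)) ** 2)) for kz in kz_indices]
    order = []
    for i in range(n_arms):
        angle = (i * GOLDEN_ANGLE_DEG) % 360.0
        kz = rng.choices(kz_indices, weights=weights, k=1)[0]
        order.append((angle, kz))
    return order


def bin_frames(order, arms_per_frame):
    """Group consecutive arms into frames of any chosen length."""
    return [order[i:i + arms_per_frame]
            for i in range(0, len(order) - arms_per_frame + 1, arms_per_frame)]


order = sampling_order(n_arms=5000, n_kz=12, seed=1)
coarse = bin_frames(order, arms_per_frame=100)  # fewer, better-sampled frames
fine = bin_frames(order, arms_per_frame=25)     # more frames, higher frame rate
```

Because neither the angle sequence nor the k_z draws depend on a fixed frame length, the same acquisition can be rebinned afterwards, which is the sense in which temporal resolution can be selected retrospectively.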

    Conclusion

    We have demonstrated improved 3D real‐time MRI of speech production using randomized, variable‐density, stack‐of‐spiral sampling with a 3D spatio‐temporally constrained reconstruction.

     
  5. Purpose

    To mitigate a common artifact in spiral real‐time MRI, caused by aliasing of signal outside the desired FOV. This artifact frequently occurs in midsagittal speech real‐time MRI.

    Methods

    Simulations were performed to determine the likely origin of the artifact. Two methods to mitigate the artifact are proposed. The first approach, denoted as “large FOV” (LF), keeps an FOV that is large enough to include the artifact signal source during reconstruction. The second approach, denoted as “estimation‐subtraction” (ES), estimates the artifact signal source before subtracting a synthetic signal representing that source in multicoil k‐space raw data. Twenty‐five midsagittal speech‐production real‐time MRI data sets were used to evaluate both of the proposed methods. Reconstructions without and with corrections were evaluated by two expert readers using a 5‐level Likert scale assessing artifact severity. Reconstruction time was also compared.

    Results

    The origin of the artifact was found to be a combination of gradient nonlinearity and imperfect anti‐aliasing in spiral sampling. The LF and ES methods were both able to substantially reduce the artifact, with average qualitative score improvements of 1.25 and 1.35 Likert levels for LF and ES correction, respectively. Average reconstruction times without correction, with LF correction, and with ES correction were 160.69 ± 1.56, 526.43 ± 5.17, and 171.47 ± 1.71 ms/frame, respectively.
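
The timing figures above can be turned into relative costs with a few lines of arithmetic. The snippet uses only the mean values reported in the abstract; the helper name `slowdown` is hypothetical.

```python
# Mean reconstruction times reported above, in ms per frame.
BASELINE_MS = 160.69  # no correction
LF_MS = 526.43        # "large FOV" correction
ES_MS = 171.47        # "estimation-subtraction" correction


def slowdown(method_ms, baseline_ms=BASELINE_MS):
    """Reconstruction time relative to the uncorrected reconstruction."""
    return method_ms / baseline_ms


lf_factor = slowdown(LF_MS)
es_factor = slowdown(ES_MS)
es_overhead_percent = (es_factor - 1.0) * 100.0
```

By this measure LF correction takes roughly 3.3 times as long as the uncorrected reconstruction, while ES correction adds about 7%.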

    Conclusion

    Both proposed methods were able to reduce the spiral aliasing artifacts, with the ES method being more effective and more time efficient.

     
  6. Purpose

    To improve the depiction and tracking of vocal tract articulators in spiral real‐time MRI (RT‐MRI) of speech production by estimating and correcting for dynamic changes in off‐resonance.

    Methods

    The proposed method computes a dynamic field map from the phase of single‐TE dynamic images after a coil phase compensation where complex coil sensitivity maps are estimated from the single‐TE dynamic scan itself. This method is tested using simulations and in vivo data. The depiction of air–tissue boundaries is evaluated quantitatively using a sharpness metric and visual inspection.
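
The core relation behind a single‐TE field map is that, once coil phase is removed, the remaining image phase is proportional to off‐resonance frequency times echo time. The sketch below illustrates only that step and makes several assumptions: the coil phase is passed in as a known input (the method above estimates sensitivity maps from the dynamic scan itself), phase is taken to accrue with positive sign, other phase contributions are ignored, and the name `field_map_hz` is hypothetical.

```python
import cmath
import math


def field_map_hz(image, coil_phase_rad, te_s):
    """Estimate off-resonance (Hz) per pixel from single-TE image phase.

    Assumes phase = 2*pi*f*TE + coil phase. The result is unambiguous only
    for |f| < 1 / (2 * TE), beyond which the phase wraps.
    """
    fmap = []
    for pixel, phi_coil in zip(image, coil_phase_rad):
        compensated = pixel * cmath.exp(-1j * phi_coil)  # remove coil phase
        fmap.append(cmath.phase(compensated) / (2.0 * math.pi * te_s))
    return fmap


# Hypothetical check: synthesize pixels with known off-resonance, then recover it.
te_s = 1e-3
true_hz = [-200.0, 0.0, 150.0]
coil_phase = [0.3, -1.1, 2.0]
image = [0.8 * cmath.exp(1j * (2.0 * math.pi * f * te_s + p))
         for f, p in zip(true_hz, coil_phase)]
estimated_hz = field_map_hz(image, coil_phase, te_s)
```

Applying this to every frame of a dynamic series gives a time-varying field map, which can then drive a conventional off‐resonance correction.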

    Results

    Simulations demonstrate that the proposed method provides robust off‐resonance correction for spiral readout durations up to 5 ms at 1.5 T. In vivo experiments during human speech production demonstrate that image sharpness is improved in a majority of data sets at air–tissue boundaries including the upper lip, hard palate, soft palate, and tongue boundaries, whereas the lower lip shows little improvement in the edge sharpness after correction.

    Conclusion

    Dynamic off‐resonance correction is feasible from single‐TE spiral RT‐MRI data, and provides a practical performance improvement in articulator sharpness when applied to speech production imaging.

     
  7. Purpose

    To develop and evaluate a technique for 3D dynamic MRI of the full vocal tract at high temporal resolution during natural speech.

    Methods

    We demonstrate 2.4 × 2.4 × 5.8 mm³ spatial resolution, 61‐ms temporal resolution, and a 200 × 200 × 70 mm³ FOV. The proposed method uses 3D gradient‐echo imaging with a custom upper‐airway coil, a minimum‐phase slab excitation, stack‐of‐spirals readout, pseudo golden‐angle view order in k_x‐k_y, linear Cartesian order along k_z, and spatiotemporal finite difference constrained reconstruction, with 13‐fold acceleration. This technique is evaluated using in vivo vocal tract airway data from 2 healthy subjects acquired on a 1.5 T scanner, 1 with synchronized audio, with 2 tasks during production of natural speech, and via comparison with interleaved multislice 2D dynamic MRI.
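
The resolution and FOV figures above imply a nominal image grid and frame rate, which the following arithmetic sketch works out. The helper name `nominal_matrix` is hypothetical, and the matrix actually acquired may differ because of rounding or oversampling choices not stated in the abstract.

```python
def nominal_matrix(fov_mm, res_mm):
    """Nominal number of voxels along each axis: FOV divided by resolution."""
    return tuple(round(f / r) for f, r in zip(fov_mm, res_mm))


FOV_MM = (200.0, 200.0, 70.0)
RES_MM = (2.4, 2.4, 5.8)
TEMPORAL_RES_MS = 61.0

matrix = nominal_matrix(FOV_MM, RES_MM)
voxel_volume_mm3 = RES_MM[0] * RES_MM[1] * RES_MM[2]
frames_per_second = 1000.0 / TEMPORAL_RES_MS
```

This gives a grid of roughly 83 × 83 × 12 voxels refreshed a little over 16 times per second.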

    Results

    This technique captured known dynamics of vocal tract articulators during natural speech tasks including tongue gestures during the production of consonants “s” and “l” and of consonant–vowel syllables, and was additionally consistent with 2D dynamic MRI. Coordination of lingual (tongue) movements for consonants is demonstrated via volume‐of‐interest analysis. Vocal tract area function dynamics revealed critical lingual constriction events along the length of the vocal tract for consonants and vowels.

    Conclusion

    We demonstrate the feasibility of 3D dynamic MRI of the full vocal tract, with spatiotemporal resolution adequate to visualize lingual movements for consonants and vocal tract shaping during natural productions of consonant–vowel syllables, without requiring multiple repetitions.

     